{ "cells": [ { "cell_type": "markdown", "id": "5d06b060", "metadata": {}, "source": [ "# Pandas\n", "\n", "## Introduction\n", "Pandas is an open-source python library which is bascially used for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.\n", "\n", "### Installation using pip" ] }, { "cell_type": "code", "execution_count": 1, "id": "0db7ccfe", "metadata": {}, "outputs": [], "source": [ "# !pip install pandas" ] }, { "cell_type": "markdown", "id": "ec9bbd19", "metadata": {}, "source": [ "We can use pandas by importing it in the file like:" ] }, { "cell_type": "code", "execution_count": 2, "id": "c7792003", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "id": "231658a9", "metadata": {}, "source": [ "### Creating data\n", "\n", "\n", "There are two core concepts in the pandas: **DataFrame** and **Series**\n", "\n", "\n", "#### DataFrame\n", "\n", "A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.\n", "\n", "For example, consider the following simple DataFrame:" ] }, { "cell_type": "code", "execution_count": 3, "id": "01049cbc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
LikesDislikes
013011
11212
\n", "
" ], "text/plain": [ " Likes Dislikes\n", "0 130 11\n", "1 121 2" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame({'Likes': [130, 121], 'Dislikes': [11, 2]})" ] }, { "cell_type": "markdown", "id": "b31e5fb8", "metadata": {}, "source": [ "In this example, the \"0, Likes\" entry has the value of 130. The \"0, Dislikes\" entry has a value of 11, and so on.\n", "\n", "DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:" ] }, { "cell_type": "code", "execution_count": 4, "id": "142e0642", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AnonymousAnalyst
0I liked it.Looks good.
1It was great!Informative
\n", "
" ], "text/plain": [ " Anonymous Analyst\n", "0 I liked it. Looks good.\n", "1 It was great! Informative" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame({'Anonymous': ['I liked it.', 'It was great!'], 'Analyst': ['Looks good.', 'Informative']})" ] }, { "cell_type": "markdown", "id": "3e20b8d6", "metadata": {}, "source": [ "We are using the `pd.DataFrame()` constructor to generate these DataFrame objects. The syntax for declaring a new one is a dictionary whose keys are the column names (Anonymous and Analyst in this example), and whose values are a list of entries. This is the standard way of constructing a new DataFrame, and the one you are most likely to encounter.\n", "\n", "\n", "The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels. Sometimes this is OK, but oftentimes we will want to assign these labels ourselves.\n", "\n", "\n", "The list of row labels used in a DataFrame is known as an **Index**. We can assign values to it by using an `index` parameter in our constructor:" ] }, { "cell_type": "code", "execution_count": 5, "id": "7086d7f3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AnonymousAnalyst
Product AI liked it.Looks good.
Product BIt was great!Informative
\n", "
" ], "text/plain": [ " Anonymous Analyst\n", "Product A I liked it. Looks good.\n", "Product B It was great! Informative" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame({'Anonymous': ['I liked it.', 'It was great!'], \n", " 'Analyst': ['Looks good.', 'Informative']},\n", " index=['Product A', 'Product B'])" ] }, { "cell_type": "markdown", "id": "8d5a6409", "metadata": {}, "source": [ "#### Series\n", "\n", "A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:" ] }, { "cell_type": "code", "execution_count": 6, "id": "2cb5cff6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 3\n", "3 4\n", "4 5\n", "dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series([1, 2, 3, 4, 5])" ] }, { "cell_type": "markdown", "id": "9cbecad5", "metadata": {}, "source": [ "A Series is, in essence, a single column of a DataFrame. So you can assign column values to the Series the same way as before, using an `index` parameter. However, a Series does not have a column name, it only has one overall `name`:" ] }, { "cell_type": "code", "execution_count": 7, "id": "07354f95", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2017 Sales 400\n", "2018 Sales 515\n", "2019 Sales 605\n", "Name: Product A, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series([400, 515, 605], index=['2017 Sales', '2018 Sales', '2019 Sales'], name='Product A')" ] }, { "cell_type": "markdown", "id": "7d045663", "metadata": {}, "source": [ "### Reading Data Files\n", "\n", "Being able to create a DataFrame or Series by hand is handy. But, most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.\n", "\n", "Data can be stored in any of a number of different forms and formats. By far the most basic of these is the humble CSV file. When you open a CSV file you get something that looks like this:\n", "\n", "***\n", "Product A,Product B,Product C, \n", "\n", "30,21,9, \n", "\n", "35,34,1, \n", "\n", "41,11,11\n", "\n", "***\n", "\n", "So a CSV file is a table of values separated by commas. Hence the name: \"Comma-Separated Values\", or CSV.\n", "\n", "\n", "Let's now try to read a very famous dataset, known as **Golf Dataset**. We'll use the `pd.read_csv()` function to read the data into a DataFrame. This goes thusly:" ] }, { "cell_type": "code", "execution_count": 8, "id": "70962998", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OutlookTempHumidityWindyPlay Golf
0RainyHotHighFalseNo
1RainyHotHighTrueNo
2OvercastHotHighFalseYes
3SunnyMildHighFalseYes
4SunnyCoolNormalFalseYes
5SunnyCoolNormalTrueNo
6OvercastCoolNormalTrueYes
7RainyMildHighFalseNo
8RainyCoolNormalFalseYes
9SunnyMildNormalFalseYes
10RainyMildNormalTrueYes
11OvercastMildHighTrueYes
12OvercastHotNormalFalseYes
13SunnyMildHighTrueNo
\n", "
" ], "text/plain": [ " Outlook Temp Humidity Windy Play Golf\n", "0 Rainy Hot High False No\n", "1 Rainy Hot High True No\n", "2 Overcast Hot High False Yes\n", "3 Sunny Mild High False Yes\n", "4 Sunny Cool Normal False Yes\n", "5 Sunny Cool Normal True No\n", "6 Overcast Cool Normal True Yes\n", "7 Rainy Mild High False No\n", "8 Rainy Cool Normal False Yes\n", "9 Sunny Mild Normal False Yes\n", "10 Rainy Mild Normal True Yes\n", "11 Overcast Mild High True Yes\n", "12 Overcast Hot Normal False Yes\n", "13 Sunny Mild High True No" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset = pd.read_csv(\"./Data/Pandas/golf-dataset.csv\")\n", "\n", "golf_dataset" ] }, { "cell_type": "markdown", "id": "136f0352", "metadata": {}, "source": [ "##### ```shape```\n", "\n", "We can use the `shape` attribute to check how large the resulting DataFrame is:" ] }, { "cell_type": "code", "execution_count": 9, "id": "2d5d6e44", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(14, 5)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.shape" ] }, { "cell_type": "markdown", "id": "25729775", "metadata": {}, "source": [ "##### ```head```\n", "\n", "We can examine the contents of the resultant DataFrame using the head() command, which grabs the first five rows:" ] }, { "cell_type": "code", "execution_count": 10, "id": "a345fcf8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OutlookTempHumidityWindyPlay Golf
0RainyHotHighFalseNo
1RainyHotHighTrueNo
2OvercastHotHighFalseYes
3SunnyMildHighFalseYes
4SunnyCoolNormalFalseYes
\n", "
" ], "text/plain": [ " Outlook Temp Humidity Windy Play Golf\n", "0 Rainy Hot High False No\n", "1 Rainy Hot High True No\n", "2 Overcast Hot High False Yes\n", "3 Sunny Mild High False Yes\n", "4 Sunny Cool Normal False Yes" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.head()" ] }, { "cell_type": "markdown", "id": "ba611a80", "metadata": {}, "source": [ "In Python, we can access the property of an object by accessing it as an attribute. A `book` object, for example, might have a `title` property, which we can access by calling `book.title`. Columns in a pandas DataFrame work in much the same way.\n", "\n", "Hence to access the `Temp` property of `golf_dataset` we can use:" ] }, { "cell_type": "code", "execution_count": 11, "id": "f0e68caf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Hot\n", "1 Hot\n", "2 Hot\n", "3 Mild\n", "4 Cool\n", "5 Cool\n", "6 Cool\n", "7 Mild\n", "8 Cool\n", "9 Mild\n", "10 Mild\n", "11 Mild\n", "12 Hot\n", "13 Mild\n", "Name: Temp, dtype: object" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.Temp" ] }, { "cell_type": "markdown", "id": "72a9fd1d", "metadata": {}, "source": [ "### Indexing in Pandas\n", "\n", "The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem. As a novice, this makes them easy to pick up and use. However, pandas has its own accessor operators, loc and iloc. For more advanced operations, these are the ones you're supposed to be using.\n", "\n", "#### Index-based Selection\n", "\n", "Pandas indexing works in one of two paradigms. The first is **index-based selection**: selecting data based on its numerical position in the data. `iloc` follows this paradigm.\n", "\n", "To select the first row of data in a DataFrame, we may use the following:" ] }, { "cell_type": "code", "execution_count": 12, "id": "9177287e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Outlook Rainy\n", "Temp Hot\n", "Humidity High\n", "Windy False\n", "Play Golf No\n", "Name: 0, dtype: object" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.iloc[0]" ] }, { "cell_type": "markdown", "id": "21c3c586", "metadata": {}, "source": [ "Both `loc` and `iloc` are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second.\n", "\n", "This means that it's marginally easier to retrieve rows, and marginally harder to get retrieve columns. To get a column with `iloc`, we can do the following:" ] }, { "cell_type": "code", "execution_count": 13, "id": "86769edd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Rainy\n", "1 Rainy\n", "2 Overcast\n", "3 Sunny\n", "4 Sunny\n", "5 Sunny\n", "6 Overcast\n", "7 Rainy\n", "8 Rainy\n", "9 Sunny\n", "10 Rainy\n", "11 Overcast\n", "12 Overcast\n", "13 Sunny\n", "Name: Outlook, dtype: object" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.iloc[:,0]" ] }, { "cell_type": "markdown", "id": "5d38b616", "metadata": {}, "source": [ "#### Label-based Selection\n", "\n", "The second paradigm for attribute selection is the one followed by the `loc` operator: **label-based selection**. 
{ "cell_type": "markdown", "id": "5d38b616", "metadata": {}, "source": [ "#### Label-based Selection\n", "\n", "The second paradigm for selection is the one followed by the `loc` operator: **label-based selection**. In this paradigm, it's the data index value, not its position, that matters.\n", "\n", "For example, to get the first entry in `golf_dataset`, we would now do the following:" ] }, { "cell_type": "code", "execution_count": 14, "id": "a75834ca", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Rainy'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.loc[0, 'Outlook']" ] }, { "cell_type": "markdown", "id": "2438c75f", "metadata": {}, "source": [ "`iloc` is conceptually simpler than `loc` because it ignores the dataset's indices. When we use `iloc` we treat the dataset like a big matrix (a list of lists), one that we have to index into by position. `loc`, by contrast, uses the information in the indices to do its work. Since your dataset usually has meaningful indices, it's usually easier to do things using `loc` instead.\n", "\n", "For example, here's one operation that's much easier using `loc`:" ] }, { "cell_type": "code", "execution_count": 15, "id": "c2f3fba0", "metadata": {}, "outputs": [ { "data": {
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OutlookTemp
0RainyHot
1RainyHot
2OvercastHot
3SunnyMild
\n", "
" ], "text/plain": [ " Outlook Temp\n", "0 Rainy Hot\n", "1 Rainy Hot\n", "2 Overcast Hot\n", "3 Sunny Mild" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.loc[:3, ['Outlook', 'Temp']]" ] }, { "cell_type": "markdown", "id": "1035d0cf", "metadata": {}, "source": [ "### Summary functions in Pandas\n", "\n", "Pandas provides many simple \"summary functions\" (not an official name) which restructure the data in some useful way. For example, consider the `describe()` method:" ] }, { "cell_type": "code", "execution_count": 16, "id": "53b9cdcd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OutlookTempHumidityWindyPlay Golf
count1414141414
unique33222
topRainyMildHighFalseYes
freq56789
\n", "
" ], "text/plain": [ " Outlook Temp Humidity Windy Play Golf\n", "count 14 14 14 14 14\n", "unique 3 3 2 2 2\n", "top Rainy Mild High False Yes\n", "freq 5 6 7 8 9" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.describe()" ] }, { "cell_type": "markdown", "id": "1ccf6afb", "metadata": {}, "source": [ "or we can find out summary for an individual attribute like for `Temp`:" ] }, { "cell_type": "code", "execution_count": 17, "id": "87a3bc6b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 14\n", "unique 3\n", "top Mild\n", "freq 6\n", "Name: Temp, dtype: object" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.Temp.describe()" ] }, { "cell_type": "markdown", "id": "2313f7e9", "metadata": {}, "source": [ "To see a list of unique values we can use the `unique()` function:" ] }, { "cell_type": "code", "execution_count": 18, "id": "9449c5ba", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Hot', 'Mild', 'Cool'], dtype=object)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.Temp.unique()" ] }, { "cell_type": "markdown", "id": "3517d7b7", "metadata": {}, "source": [ "### Dtypes" ] }, { "cell_type": "markdown", "id": "5ec6317e", "metadata": {}, "source": [ "The data type for a column in a DataFrame or a Series is known as the **dtype**.\n", "\n", "You can use the `dtype` property to grab the type of a specific column. For instance, we can get the dtype of the `Windy` column in the `golf_dataset` DataFrame:" ] }, { "cell_type": "code", "execution_count": 19, "id": "36bd64f0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('bool')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.Windy.dtype" ] }, { "cell_type": "markdown", "id": "86d79071", "metadata": {}, "source": [ "Alternatively, the `dtypes` property returns the dtype of every column in the DataFrame" ] }, { "cell_type": "code", "execution_count": 20, "id": "171ce1c6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Outlook object\n", "Temp object\n", "Humidity object\n", "Windy bool\n", "Play Golf object\n", "dtype: object" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "golf_dataset.dtypes" ] }, { "cell_type": "markdown", "id": "443cc4ac", "metadata": {}, "source": [ "## Further Readings\n", "\n", "We've seen the most commonly used functions of Pandas that are used in the field of Machine Learning and Data Analysis. However there is a variety of other functions that Pandas provides us.\n", "\n", "For further understanding, or to explore more you may refer to the official documentation of Panadas at:\n", "\n", "https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }